Fix CHAR/VARCHAR length overflow when writing reconcile intermediate data by moomindani · Pull Request #2428 · databrickslabs/lakebridge

moomindani · 2026-05-09T00:55:59Z

Changes

What does this PR do?

Strip CHAR(n)/VARCHAR(n) length constraints from DataFrames before writing intermediate data to Delta during reconciliation. This prevents DELTA_EXCEED_CHAR_VARCHAR_LIMIT errors when source data contains space-padded CHAR values.

Root cause

Some data sources (e.g., Teradata) return CHAR(n) values with space padding via JDBC, resulting in values that exceed the declared column length (e.g., a CHAR(16) column returning 16 digits + 16 spaces = 32 characters). Delta enforces CHAR/VARCHAR length constraints through column metadata (__CHAR_VARCHAR_TYPE_STRING), causing writes to fail for these padded values.

This was observed with Teradata via Lakehouse Federation but not with Lakebase (PostgreSQL) via Lakehouse Federation.
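
To make the arithmetic above concrete, here is a minimal Python sketch. It models the length check as a plain character count against the declared length, which is a simplification for illustration and not Delta's actual enforcement code. The helper name `exceeds_declared_length` and the sample digits are hypothetical.

```python
def exceeds_declared_length(value: str, declared_length: int) -> bool:
    """Return True when the value has more characters than CHAR(n)/VARCHAR(n) allows.

    Simplified model: a plain character count, not Delta's real check.
    """
    return len(value) > declared_length


# A CHAR(16) value as described above: 16 digits followed by 16 padding spaces.
digits = "1234567890123456"
padded = digits + " " * 16

print(len(padded))                                       # 32
print(exceeds_declared_length(padded, 16))               # True
print(exceeds_declared_length(padded.rstrip(" "), 16))   # False
```

Under this model the padded value fails a CHAR(16) limit even though its meaningful content fits exactly.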

Fix

Strip all column metadata via col.alias(name, metadata={}) before writing intermediate DataFrames to Delta. This removes the constraint that Delta uses for length enforcement. The intermediate data is temporary and does not need metadata preservation.
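
A minimal PySpark sketch of this approach follows. The function name `strip_column_metadata` is hypothetical and the helper in the PR may be written differently; the only API relied on is `Column.alias(name, metadata=...)`, as named above.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # used for type hints only; nothing is imported at runtime
    from pyspark.sql import DataFrame


def strip_column_metadata(df: "DataFrame") -> "DataFrame":
    """Re-select every top-level column under its own name with empty metadata.

    Data types are left untouched; only per-column metadata
    (including __CHAR_VARCHAR_TYPE_STRING) is replaced with an empty dict.
    """
    cols = [df[field.name].alias(field.name, metadata={}) for field in df.schema.fields]
    return df.select(*cols)
```

Such a helper would be called on the intermediate DataFrame immediately before the Delta write. Column names containing dots or other special characters would need extra quoting, which this sketch does not handle.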

Linked issues

Fixes #2389

Tests

manually tested with Teradata via Lakehouse Federation
added unit tests
added integration tests

Test plan

test_strip_char_varchar_constraints_strips_metadata — verifies CHAR/VARCHAR metadata is stripped
test_strip_char_varchar_constraints_preserves_types — verifies column types are preserved
All existing reconcile unit tests pass

Reopened from #2390 on an upstream branch to bypass the fork-PR OIDC restriction on JFrog auth (CI cannot run on fork PRs). All review comments and history are preserved on the original PR.

Some data sources (e.g., Teradata) return CHAR(n) values with space padding via JDBC, resulting in values that exceed the declared column length. Delta enforces CHAR/VARCHAR length constraints through column metadata (__CHAR_VARCHAR_TYPE_STRING), causing writes to fail for these padded values. Strip all column metadata via col.alias(name, metadata={}) before writing intermediate DataFrames to Delta. This removes the constraint that Delta uses for length enforcement. Observed with Teradata via Lakehouse Federation but not with Lakebase (PostgreSQL) via Lakehouse Federation.

Co-authored-by: Isaac

codecov · 2026-05-09T00:59:37Z

Codecov Report

❌ Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.78%. Comparing base (1c32cbb) to head (2ef8da0).
⚠️ Report is 10 commits behind head on main.

Files with missing lines	Patch %	Lines
...abricks/labs/lakebridge/reconcile/recon_capture.py	60.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2428      +/-   ##
==========================================
- Coverage   65.78%   65.78%   -0.01%     
==========================================
  Files          98       98              
  Lines        9237     9242       +5     
  Branches      992      992              
==========================================
+ Hits         6077     6080       +3     
- Misses       2984     2986       +2     
  Partials      176      176

`_write_df_to_delta` is a module-level function that accessed ReconIntermediatePersist._strip_char_varchar_constraints from outside the class, which pylint flags as protected-access. Rename the helper to a public name, since it is effectively a utility. Also rename the unused mock_select argument to *_cols and fix the test function names.

Co-authored-by: Isaac

github-actions · 2026-05-09T01:25:46Z

✅ 148/148 passed, 5 skipped, 24m59s total

Running from acceptance #4311

m-abulazm · 2026-05-14T08:50:45Z

this looks like a deeper issue with teradata jdbc driver:

https://stackoverflow.com/questions/70596812/how-to-avoid-blank-spaces-while-loading-data-from-teradata-to-databricks

https://support.teradata.com/community?id=community_question&sys_id=ca9847a71b97fb00682ca8233a4bcb41

https://teradata-docs.s3.amazonaws.com/doc/connectivity/jdbc/reference/current/jdbcug_chapter_5.html#BGBJECGD

anyway the better fix right now would be to trim variable length columns in databricks/labs/lakebridge/reconcile/query_builder/expression_generator.py:250 for databricks which will apply to teradata foreign catalogs

also please investigate the current query that gets built. it should already have trim as it is specified as the universal transformation
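
As a rough illustration of the trimming approach suggested here, the sketch below wraps character columns in `TRIM` while building a select list. The function `build_select_list` and its input format are hypothetical; this does not reflect the actual code in expression_generator.py or the query that reconcile currently builds.

```python
def build_select_list(columns: dict[str, str]) -> str:
    """Build a SELECT list, trimming columns whose declared type is CHAR/VARCHAR.

    `columns` maps column name -> declared type, e.g. {"card_no": "char(16)"}.
    Hypothetical helper for illustration only.
    """
    parts = []
    for name, data_type in columns.items():
        if data_type.lower().startswith(("char", "varchar")):
            # Strip padding spaces at query time, before the data reaches Delta.
            parts.append(f"TRIM({name}) AS {name}")
        else:
            parts.append(name)
    return ", ".join(parts)


print(build_select_list({"id": "int", "card_no": "char(16)", "note": "varchar(50)"}))
# id, TRIM(card_no) AS card_no, TRIM(note) AS note
```

Trimming in the query addresses the padded values themselves, whereas stripping metadata only removes the length check on the intermediate table.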

moomindani added 3 commits April 24, 2026 16:33

Merge remote-tracking branch 'origin/main' into fix/recon-varchar-length-overflow

a603b3b

Apply black/ruff formatting after main merge

dc79384

- black reformats list comprehension to single line in test helper
- ruff removes unused StringType import (was used in main, dropped after merge)

Co-authored-by: Isaac

moomindani requested a review from a team as a code owner May 9, 2026 00:56

moomindani had a problem deploying to tool May 9, 2026 00:56 — with GitHub Actions Error

moomindani mentioned this pull request May 9, 2026

Fix CHAR/VARCHAR length overflow when writing reconcile intermediate data #2390

Closed

6 tasks

moomindani temporarily deployed to tool May 9, 2026 01:20 — with GitHub Actions Inactive

m-abulazm assigned moomindani May 14, 2026

m-abulazm added the feat/recon label (making sure that remorphed query produces the same results as original) May 14, 2026

m-abulazm requested changes May 14, 2026

moomindani wants to merge 4 commits into main from fix/recon-varchar-length-overflow

2 participants
